
All Questions

Tagged with
0 votes
0 answers
39 views

Keep training a PyTorch model on new data

I'm working on a text classification task and have decided to use a PyTorch model for this purpose. The process mainly involves the following steps: Load and process the text. Use a TF-IDF Vectorizer....
Simon
1 vote
1 answer
907 views

LLaMA model without using the Hugging Face API

Is it possible to obtain the LLaMA model on its own, as open-source code, without using the Hugging Face API, so that it can be hosted on our own server?
Anagha M P
1 vote
1 answer
389 views

Text segmentation problem

I am new to ML and trying to solve a text segmentation problem. I have a transcript of a news show and I want to split it into parts by topic. I tried googling and asking ChatGPT and found ...
Oleg Bovykin
0 votes
0 answers
129 views

On which texts should TfidfVectorizer be fitted when using TF-IDF cosine for text similarity?

I wonder on which texts TfidfVectorizer should be fitted when using TF-IDF cosine for text similarity. Should TfidfVectorizer be fitted on the texts that are analyzed for text similarity, or some ...
Franck Dernoncourt
2 votes
2 answers
130 views

In sklearn tfidf, what is the difference between term frequency and document frequency?

Looking at the sklearn tfidf page: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html and trying to understand the difference between term frequency ...
james pow
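The distinction asked about in the question above can be sketched in plain Python. This is a minimal illustration, not sklearn's implementation: the whitespace tokenizer is a simplifying assumption (sklearn's default `token_pattern` differs, e.g. it drops single-character tokens), and the IDF function reproduces the smoothed formula documented for `TfidfVectorizer` with `smooth_idf=True`, ln((1 + n) / (1 + df)) + 1. The helper names are hypothetical.

```python
import math
from collections import Counter

def tokenize(doc):
    # Simplified whitespace tokenizer (assumption, not sklearn's token_pattern).
    return doc.lower().split()

def term_frequencies(doc):
    # Term frequency: how many times each term occurs in ONE document.
    return Counter(tokenize(doc))

def document_frequencies(docs):
    # Document frequency: in how MANY documents each term occurs.
    # set() makes a term count at most once per document.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df

def smooth_idf(df, n_docs):
    # Smoothed IDF as documented for TfidfVectorizer(smooth_idf=True).
    return math.log((1 + n_docs) / (1 + df)) + 1

docs = ["the cat sat", "the cat ate the fish", "a dog ran"]
tf = term_frequencies(docs[1])
df = document_frequencies(docs)
print(tf["the"], df["the"])  # 2 2
```

"the" occurs twice inside the second document (its term frequency there), but it appears in only two of the three documents (its document frequency), which is what feeds the IDF weight.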
3 votes
4 answers
2k views

Accuracy is getting worse after text preprocessing

I'm working on a multi-class text classification project. After splitting the dataset into training and test sets, I've applied the function below to the training set (a.k.a. preprocessing): ...
Ben
3 votes
1 answer
574 views

Is there a way to map words to their synonyms in tfidf?

I have the following code: ...
james pow
1 vote
1 answer
197 views

Why is max_features ordered by term frequency instead of inverse document frequency?

In the docs: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html it is explained that max_features is ordered by ...
james pow
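The `TfidfVectorizer` documentation linked in the question says `max_features` keeps the top terms "ordered by term frequency across the corpus". A minimal plain-Python sketch of that selection rule follows; it is not sklearn's code, the function name is hypothetical, and the alphabetical tie-breaking is an assumption of this sketch rather than documented behaviour.

```python
from collections import Counter

def top_k_vocabulary(docs, k):
    # Count every occurrence of every term across the whole corpus
    # (corpus-wide term frequency, not document frequency or IDF).
    totals = Counter()
    for doc in docs:
        totals.update(doc.lower().split())
    # Rank by descending count; ties broken alphabetically (assumption).
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    # Return the kept terms in sorted order, like a vocabulary listing.
    return sorted(term for term, _ in ranked[:k])

docs = ["the cat sat", "the cat ate the fish", "a dog ran"]
print(top_k_vocabulary(docs, 2))  # ['cat', 'the']
```

Ranking by IDF instead would keep the rarest terms first, which in a pruned vocabulary tends to favour one-off tokens such as typos.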
0 votes
1 answer
233 views

LinearSVC training time with CountVectorizer and HashingVectorizer

I am currently trying to build a text classifier and I am experimenting with different settings. Specifically, I am extracting my features with a CountVectorizer ...
ryuzakinho
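The two feature extractors named in the question above differ mainly in how a term is mapped to a column: `CountVectorizer` learns a vocabulary from the data, while `HashingVectorizer` computes the column from a hash and needs no fitted vocabulary. The sketch below illustrates that idea in plain Python under stated assumptions: the helper names are hypothetical, and `zlib.crc32` stands in for the MurmurHash3 that sklearn actually uses (sklearn also alternates signs by default via `alternate_sign`).

```python
import zlib

def fit_vocabulary(docs):
    # CountVectorizer-style: a term -> column mapping learned from the data.
    terms = sorted({t for doc in docs for t in doc.lower().split()})
    return {term: i for i, term in enumerate(terms)}

def count_vector(doc, vocabulary):
    vec = [0] * len(vocabulary)
    for t in doc.lower().split():
        if t in vocabulary:  # terms unseen during fitting are dropped
            vec[vocabulary[t]] += 1
    return vec

def hashed_vector(doc, n_features):
    # HashingVectorizer-style: no vocabulary; the column comes from a hash.
    # crc32 is used only because it is stable across runs (stand-in hash).
    vec = [0] * n_features
    for t in doc.lower().split():
        vec[zlib.crc32(t.encode("utf-8")) % n_features] += 1
    return vec

docs = ["the cat sat", "the cat ate the fish"]
vocab = fit_vocabulary(docs)
print(count_vector("the dog sat", vocab))  # [0, 0, 0, 1, 1]
print(sum(hashed_vector("the dog sat", 16)))  # 3
```

The hashed version keeps "dog" even though it was never seen, at the cost of possible collisions and a fixed feature width (`n_features`, which defaults to 2**20 in sklearn) that the downstream classifier has to handle.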
0 votes
1 answer
59 views

Optimal clusters for K-means not clear - any ideas?

I have a toy dataset of 10,000 strings of people's names, addresses and birthdays. As a quirk of the data collection process it is highly likely there are duplicate people caused by typos and I am ...
Sandy Lee
1 vote
0 answers
18 views

What approaches can be used to merge (ensemble) a non-probabilistic model with a RandomForest?

I have an RF for text classification and it gives me accuracy. Almost the same metric is given by another model built using ...
Deshwal
4 votes
1 answer
1k views

How to perform an entity-level train/val/test split for an NER task?

sklearn provides normal and stratified split options that can be used for ML problems such as multi-class classification. This is relatively easy to do because (1) each sample has one class, ...
Mohit
0 votes
3 answers
156 views

Creating numeric word representations of input sentences results in MemoryError

I am trying to use CountVectorizer to obtain numerical word representations of data that is essentially a list of 160000 English sentences: ...
Mahesha999
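The cause of the error is not visible in the truncated excerpt above, but a back-of-the-envelope calculation shows why matrix layout matters at this scale. `CountVectorizer.fit_transform` returns a sparse matrix; if it is converted to a dense array (for example with `.toarray()`), every cell is stored. The sketch below compares the two layouts using the 160000 sentences from the question plus two hypothetical figures (a 50,000-term vocabulary and 15 distinct words per sentence); the 8-byte values and 4-byte indices are also assumptions.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    # A dense matrix stores every cell, zero or not.
    return n_rows * n_cols * itemsize

def csr_bytes(n_rows, nnz, value_size=8, index_size=4):
    # CSR stores one value and one column index per nonzero entry,
    # plus n_rows + 1 row pointers.
    return nnz * (value_size + index_size) + (n_rows + 1) * index_size

n_sentences = 160_000   # from the question
vocab_size = 50_000     # hypothetical vocabulary size
nnz_per_row = 15        # hypothetical distinct words per sentence

print(dense_bytes(n_sentences, vocab_size) / 1e9)  # 64.0 (GB)
print(csr_bytes(n_sentences, n_sentences * nnz_per_row))
```

Under these assumptions the dense matrix needs 64 GB, while the sparse one needs 29,440,004 bytes, about 29 MB.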
1 vote
1 answer
74 views

Which algorithm is best for predicting diseases if symptoms are given? [closed]

After topic modelling with LDA, I get the following dataset as a result. ...
Atom Store
3 votes
1 answer
1k views

How to identify/recognize that a sentence talks about the future?

Brief introduction: I have a report/paragraph containing sentences that refer to future plans/outlooks/expectations for a particular entity. For now, I want to extract all such sentences. ...
Krs
